Introduction

Multi-agent  competitive  and  cooperative  systems  occur  in  a  vast  range  of  designed 
and  natural  settings  such  as  communication,  economic,  energy  and  transportation 
systems.  However,  the  complexity  of  such  large  population  stochastic  dynamic 
systems,  and  frequently  their  inherent  nature,  make  centralized  control  infeasible  or 
irrelevant  and  standard  game  theoretic  analysis  intractable.  The  key  idea  of  Mean 
Field  (MF)  stochastic  control  (or  Nash  Certainty  Equivalence  (NCE)  control)  is  that 
when the agent population is very large, individual feedback strategies exist for all
of the agents so that each agent will be in an approximate Nash equilibrium with
the pre-computable behaviour of the mass of other agents. During the period of
this award, significant progress has been made in this research area.
In particular, as planned, the following topics have been analyzed and
computational  methodologies  generated:  (i)  a  fundamental  theory  and 
computational  methodology  of  adaptive  MF  Stochastic  systems,  (ii)  a  MF  Games 
theory  of  Consensus  Systems,  (iii)  MF  Leader-Follower  Systems,  (iv)  applications  to 
power  markets,  and  (v)  a  fundamental  analysis  of  non-linear  Mean  Field  Systems 
with  Major  and  Minor  players. 

In  this  Final  Report  the  main  sections  are  devoted  to  the  five  topics  listed  above. 
This  comprehensive  summary  of  the  significant  work  accomplished  within  this 
research program is followed by a bibliography of the publications which have been
generated  by  the  PI  and  his  collaborators. 

1.  Stochastic Adaptive Mean Field Control

The  inclusion  of  learning  procedures  for  the  identification  by  a  given  agent  of  the 
dynamical  and  cost  function  parameters  of  other  competing  agents  in  a  stochastic 
dynamic  system,  or  of  the  statistical  distribution  of  these  parameters  in  a  mass  of 
competing  agents,  introduces  new  features  into  the  system  theoretic  NCE  (MF) 
setup. 

The  natural  initial  problem  in  the  development  of  adaptive  MF  stochastic  system 
theory  is  that  where  each  agent  needs  to  estimate  its  own  dynamical  parameters, 
while  its  control  actions  are  permitted  to  be  explicit  functions  of  the  parameter 
distribution of the entire population of competing agents. A subsequent problem is
the generalization where each agent also needs to estimate the distribution of the
population's  dynamical  parameters,  and  a  natural  further  generalization  is  the  case 
where  the  cost  function  parameters  also  vary  over  the  population  and  this 
distribution  is  unknown  to  each  agent  and  hence  needs  to  be  estimated.  In  this 
paper  we  provide  a  solution  to  the  most  general  problem  in  this  sequence. 

The  work  in  this  program  under  this  heading  provides  a  study  of  the  mean  field 
stochastic  adaptive  optimal  control  problem  where  the  cost  functions  of  the  agents 
in  a  population  are  coupled,  and  each  agent  estimates  its  own  dynamical  parameters 
based  upon  observations  of  its  own  trajectory,  and  furthermore  estimates  the 
distribution  parameter  of  the  population's  dynamical  and  cost  function  parameters 
by  observing  a  randomly  chosen  fraction  of  the  population.  This  work  makes  a 
contribution  to  the  mean  field  literature  by  extending  the  established  epsilon-Nash 
equilibrium  results  of  a  large  population  of  egoistic  agents  to  a  large  population  of 
adaptive  egoistic  agents.  The  information  requirement  for  each  agent  is  kept  limited 
in  the  sense  that  the  distribution  of  the  dynamical  parameters  of  the  population  is 
estimated  only  through  a  fraction  of  the  population  which  becomes  negligible  as  the 
population  size  grows  to  infinity.  The  strong  consistency  of  the  self-parameter 
estimates  and  the  population  distribution  function  parameter  estimates,  the 
stability  of  the  system,  and  an  epsilon-Nash  Equilibrium  property  are  all 
established  in  this  analysis. 
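
As a rough illustration of the self-parameter estimation step, the following sketch estimates the drift parameter of a scalar linear system from an agent's own noisy trajectory. This report does not specify the estimator, so ordinary least squares is used here as a stand-in; the model x[k+1] = a*x[k] + u[k] + w[k], the function name estimate_drift, the feedback gain and all numerical values are assumptions made for illustration only.

```python
import random

def estimate_drift(a_true=0.9, gain=0.4, steps=5000, noise_std=1.0, seed=0):
    """Least-squares estimate of a in x[k+1] = a*x[k] + u[k] + w[k].

    Hypothetical scalar model; only the agent's own trajectory is used.
    """
    rng = random.Random(seed)
    x = 0.0
    num = 0.0  # running sum of x[k] * (x[k+1] - u[k])
    den = 0.0  # running sum of x[k] ** 2
    for _ in range(steps):
        u = -gain * x  # known stabilizing feedback (illustrative choice)
        x_next = a_true * x + u + rng.gauss(0.0, noise_std)
        num += x * (x_next - u)
        den += x * x
        x = x_next
    # Guard against an all-zero trajectory, which carries no information.
    return num / den if den > 0.0 else 0.0

a_hat = estimate_drift()
print(f"estimate of a: {a_hat:.3f}")
```

Increasing steps should move the estimate closer to a_true, which is the behaviour that a strong consistency result formalizes. The process noise provides the excitation in this toy setting.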

All  of  the  work  on  stochastic  adaptive  MFG  control  has  either  been  published  in 
journals,  presented  at  conferences  or  has  been  accepted  for  publication  (see  the  list 
of  papers  in  Subsection  1  of  the  References  section). 

2.  Mean  Field  Theory  of  Consensus  and  Social  Systems 

A  consensus  process  is  the  process  of  dynamically  reaching  an  agreement  between 
the  agents  of  a  group  on  some  common  state  properties  such  as  position  or  velocity. 
The  formulation  of  consensus  systems  is  one  of  the  important  issues  in  the  area  of 
multi-agent  control  and  coordination,  and  has  been  an  active  area  of  research  in  the 
systems  and  control  community  over  the  past  few  years. 

In  the  Mean  Field  (MF)  dynamic  game  consensus  model  considered  in  this  work: 

(i)  each  agent  has  a  priori  information  on  the  initial  state  distribution  mean  of  the 
overall  population,  (ii)  the  set  of  MF  control  laws  possesses  an  epsilon-Nash 
equilibrium  property,  (iii)  the  system  of  agents  reaches  consensus  and  does  not 
require communication with other agents. In contrast, in the Standard Consensus (SC)
algorithms:  (i)  agents  need  no  a  priori  information  on  the  initial  state  distribution  of 
the  overall  population  but  require  local  communication  with  other  agents,  (ii) 
consensus  can  be  achieved  if  the  union  of  the  interaction  graphs  for  the  system  is 
connected  frequently  enough  as  the  system  evolves.  Furthermore,  in  the 
deterministic  problem  formulation,  we  show  that  a  finite  population  system  with 
the  observation  feedback  algorithm  reaches  consensus  on  the  initial  state 
distribution  mean  as  time  and  population  size  N  go  to  infinity. 
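
The difference in information structure between the two approaches can be sketched in a few lines. The first function below is a textbook local-averaging consensus iteration on a fixed ring graph, where each agent communicates with its two neighbours. The second is a deliberately simplified caricature in which each agent uses only the a priori known initial mean and its own state, with no communication; it is not the MF control law derived in the cited papers. All function names, the ring topology, the gains and the initial states are assumptions chosen for illustration.

```python
def standard_consensus(states, steps=200, step_size=0.25):
    """Local averaging on a ring: each agent uses its two neighbours' states."""
    x = list(states)
    n = len(x)
    for _ in range(steps):
        x = [
            x[i] + step_size * ((x[(i - 1) % n] - x[i]) + (x[(i + 1) % n] - x[i]))
            for i in range(n)
        ]
    return x

def track_known_mean(states, prior_mean, steps=200, gain=0.1):
    """No communication: each agent moves toward the a priori known initial mean."""
    x = list(states)
    for _ in range(steps):
        x = [xi + gain * (prior_mean - xi) for xi in x]
    return x

initial = [float(i) for i in range(10)]
mean0 = sum(initial) / len(initial)  # assumed known a priori to every agent

sc = standard_consensus(initial)
mf = track_known_mean(initial, mean0)
print("spread after SC iteration:  ", max(sc) - min(sc))
print("spread after mean tracking: ", max(mf) - min(mf))
```

Both iterations drive the states toward the initial mean in this toy example, but the first needs a connected communication graph while the second needs the prior information on the mean.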


In the deterministic problem formulation for a finite population system, we show
that the long run average (LRA) cost of each individual at the MF Nash equilibrium, the minimal LRA
social  cost  with  decentralized  MF  strategies  and  the  minimal  LRA  social  cost  with 
centralized  information  are  equal  to  zero.  However,  the  transient  solutions  of  these 
social  optimal  strategies  will  in  general  be  different.  The  SC  algorithms  require 
global  communication  with  other  agents  (or  local  communication  with  neighbors  in 
the  random  and  time-varying  network  topologies)  in  the  system  and  for  large  N  this 
leads  to  high  communication  and  computational  complexity.  On  the  other  hand,  the 
decentralized  social  MF  control  laws  do  not  require  even  local  communication  and 
hence  are  robust  with  respect  to  communication  network  failures,  but  to  gain  this 
property  a  priori  information  on  the  mean  of  the  system's  initial  state  distribution 
must  be  available  to  each  agent  (see  the  papers  and  conference  items  in  References 
Subsection  2).  This  problem  formulation  further  applies  to  Cucker-Smale  type 
flocking  systems  (see  References  Subsection  2). 

Linked to this work are the studies carried out on the emergence of coalitions in MF
systems  and  of  the  properties  of  mixed  populations  of  major-minor  and  egoist- 
altruist  MF  games  systems,  and  moreover,  the  investigations  of  the  relationship 
between  competitive  game  theoretic  MF  system  behaviour  and  social  cooperative 
behaviour  (again  see  the  papers  and  conference  items  in  References  Subsection  2). 


3.  Mean  Field  Leader-Follower  Systems 

In this body of work a game theory based model of collective dynamics has been
produced  which  includes  leaders,  followers  and  a  reference  trajectory  to  be  tracked. 
The  mean  field  equations  characterizing  the  Nash  equilibrium  for  infinite  population 
systems  were  derived,  and  under  appropriate  conditions,  they  have  a  unique 
solution  leading  to  decentralized  control  laws.  Furthermore,  for  large  but  finite 
population  systems,  such  controls  were  shown  to  correspond  to  so-called  epsilon- 
Nash  equilibria. 
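
For reference, the epsilon-Nash property referred to here is usually stated as follows. The notation is assumed for this sketch and is not taken from this report: J_i^N denotes the cost of agent i in the N-agent game, u_i its strategy, u_{-i} the strategies of the other agents, and \mathcal{U}_i the set of admissible strategies of agent i.

```latex
% A strategy profile (u_1, ..., u_N) is an epsilon-Nash equilibrium if no agent
% can reduce its own cost by more than epsilon through a unilateral deviation:
J_i^N(u_i, u_{-i}) \;\le\; \inf_{v_i \in \mathcal{U}_i} J_i^N(v_i, u_{-i}) + \epsilon ,
\qquad 1 \le i \le N .
```

In mean field results of this type, epsilon typically depends on N and tends to zero as the population size goes to infinity.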

The  computation  of  the  followers'  control  laws  requires  knowledge  of  the  complete 
reference  trajectory  of  the  leaders  which  is  in  general  not  known  to  the  followers.  In 
response  to  this  we  have  proposed  and  tested  a  reference  trajectory  likelihood  ratio 
based  adaptation  scheme  based  on  noisy  observations  by  followers  of  a  random 
sample  of  leaders.  Under  appropriate  identifiability  conditions,  it  is  established  that 
this  identification  scheme  is  able  to  select  the  exact  reference  trajectory  model 
within  a  finite  class  of  candidates  in  a  finite  deterministic  time  almost  surely  as  the 
number  of  samples  goes  to  infinity.  As  a  result,  the  two  phase  (estimation  based) 
adaptive  mean  field  control  laws  of  the  followers  together  with  the  mean  field 
control laws of the leaders give rise to a dynamic stochastic Nash equilibrium for the
overall leader-follower system (see the papers and conference items in References
Subsection 3).
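
The idea of selecting a reference trajectory model from a finite class by comparing likelihoods can be sketched as follows. This is a simplified stand-in rather than the scheme of the cited papers: the observations are taken to be the reference signal itself plus independent Gaussian noise, instead of noisy observations of a random sample of leaders, and the candidate trajectories, noise level and function name select_reference are all hypothetical.

```python
import math
import random

def select_reference(candidates, times, observations, noise_std=1.0):
    """Return the index of the candidate with the largest Gaussian log-likelihood."""
    def log_likelihood(h):
        # Log-likelihood up to an additive constant common to all candidates.
        return -sum((y - h(t)) ** 2 for t, y in zip(times, observations)) / (
            2.0 * noise_std ** 2
        )
    scores = [log_likelihood(h) for h in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Hypothetical finite class of candidate reference trajectories.
candidates = [
    lambda t: 0.0,
    lambda t: math.sin(t),
    lambda t: 0.5 * t,
]

rng = random.Random(0)
times = [0.05 * k for k in range(400)]
true_index = 1
observations = [candidates[true_index](t) + rng.gauss(0.0, 1.0) for t in times]

chosen = select_reference(candidates, times, observations)
print("selected candidate:", chosen)
```

Comparing log-likelihoods pairwise is equivalent to thresholding likelihood ratios at one. The candidates must be distinguishable over the observation window, which plays the role of the identifiability condition in this toy setting.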


4.  Mean  Field  Theory  Applications  to  Power  Markets 

The  term  "smart  grid"  refers  to  the  incorporation  of  recent  advances  in 
communication and computation into the grid in order to increase the
connectivity, automation and coordination among the suppliers, consumers and
networks  which  perform  transmission  or  distribution  tasks.  The  long-term  goal  is  to 
formulate  a  power  system  model  and  associated  control  laws  where  highly 
intermittent  suppliers  can  be  accommodated,  peak  demand  can  be  reduced,  and 
dependency  on  polluting  fossil  fuels  with  their  volatile  price  can  be  decreased. 

One  of  the  important  innovations  that  the  smart  grid  offers  is  the  replacement  of 
analog  mechanical  meters  with  smart  meters,  that  is  to  say  digital  meters  which 
have  fast  transmission  capabilities  that  can  carry  instant  information  such  as  the 
locational  marginal  price  of  the  grid,  and  have  a  certain  amount  of  computation 
power.  Even  though  technological  advances  enable  the  use  of  smart  meters  which 
facilitates  the  demand  response  mechanism,  there  are  several  issues  to  be 
considered  before  these  useful  devices  can  be  integrated  in  the  large  scale, 
algorithm  where  the  consumers  and  suppliers  only  follow  the  price  signal  measured 
from  the  smart  meters  and  have  statistical  information  measured  from  the  entire 
population.  We  propose  a  decentralized  algorithm  that  gives  the  best  response 
action  for  each  agent  for  an  infinite  population.  The  algorithm  guarantees  an 
epsilon-Nash  equilibrium  and  stability  for  the  finite  population  system  under  the 
strong  assumption  that  the  population  dynamical  parameter  distribution  is  perfectly 
known  to  all  agents  in  the  system.  The  model  in  this  paper  is  highly  stylized  in  order 
to  obtain  analytical  tractability,  and  some  of  the  assumptions  are  hard  to  satisfy  in  a 
real  power  market.  However,  most  of  these  assumptions  can  be  softened  by  either 
incorporating  numerical  analysis  techniques  to  solve  the  partial  differential 
equations  or  by  data  driven  analysis  in  order  to  capture  the  instantaneous 
population  parameter  distributions,  etc. 

The price trajectory forecast carries valuable information for consumers, suppliers
and operators. For the suppliers and consumers it gives the opportunity to
decide whether to start up or shut down their current load or supply, which is a costly action.
For the operators it gives a chance to take action in case of an oscillating or
volatile price trajectory forecast.

Mean Field (MF) Stochastic Systems

Best response calculation in a dynamic large population game requires complete state
observations on the population for each agent. The complexity becomes intractable
as the population gets larger. For these problems, the mean field framework
provides decentralized strategies that yield Nash equilibria in the asymptotic limit
of  an  infinite  (mass)  population.  The  control  laws  use  only  the  local  information  of 
each  agent  on  its  own  state  and  own  dynamical  parameters,  while  the  mass  effect  is 
calculated  offline  using  statistical  information.  These  laws  yield  approximate 
equilibria when applied in the finite population (see References Subsection 4).

In  this  work  the  mass  effect  has  two  components,  namely  the  consumer  and  supplier 
masses.  These  two  population  partitions  have  different  characteristics;  however,  in 
the  population  limit,  the  mean  field  equations  provide  the  smooth  deterministic 
joint  mass  effect. 


5.  Non-linear  Mean  Field  Systems  with  Major  and  Minor  Players 

This work studies a stochastic mean field (SMF) system for a class of dynamic games
involving nonlinear stochastic dynamical systems with major and minor (MM)
agents. The SMF system consists of coupled (i) backward in time stochastic
Hamilton-Jacobi-Bellman equations, and (ii) forward in time stochastic McKean-Vlasov
or stochastic Fokker-Planck-Kolmogorov equations. Existence and
uniqueness of the solution to the MM-SMF system are established by a fixed point
argument in the Wasserstein space of random probability measures. In the case that
minor agents are coupled to the major agent only through their cost functions, the
epsilon-Nash equilibrium property of the SMF best response controls is
shown for a finite N population system where epsilon_N = O(1/N^(1/2)). As a
particular  but  important  case,  the  results  of  Nguyen  and  Huang  (2011)  for  MM 
stochastic  mean  field  linear-quadratic-Gaussian  systems  with  homogeneous 
population  are  retrieved,  and,  in  addition,  the  results  of  this  work  are  illustrated 
with  a  major  and  minor  agent  version  of  a  game  model  of  the  synchronization  of 
coupled nonlinear oscillators (see References Subsection 5).